Characterizing Discontinuity in Constituent Treebanks
Abstract
Measures for the degree of non-projectivity of dependency grammar have received attention both on the formal and on the empirical side. The empirical characterization of discontinuity in constituent treebanks annotated with crossing branches has nevertheless been neglected so far. In this paper, we present two measures for the characterization of both the discontinuity of constituent structures and the non-projectivity of dependency structures. An empirical evaluation on German data as well as an investigation of the relation between our measures and grammars extracted from treebanks shows their relevance.
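The abstract does not name the two measures it proposes. As a rough illustration of how non-projectivity can be quantified at all, the sketch below computes gap degree, a measure commonly used in the dependency literature: the largest number of interruptions in the yield of any node. Whether gap degree coincides with either of the paper's measures is an assumption, and the head-array encoding and the function names `yields`, `gaps`, and `gap_degree` are hypothetical choices made for this example only.

```python
from typing import Dict, List, Set


def yields(heads: List[int]) -> Dict[int, Set[int]]:
    """Map each token (1-based) to the positions it dominates, itself included.

    heads[i] is the 1-based head of token i + 1; 0 marks the root.
    Assumes the input is a well-formed tree (no cycles).
    """
    n = len(heads)
    proj = {i: {i} for i in range(1, n + 1)}
    for i in range(1, n + 1):
        # Walk from token i up to the root, adding i to every ancestor's yield.
        h = heads[i - 1]
        while h != 0:
            proj[h].add(i)
            h = heads[h - 1]
    return proj


def gaps(positions: Set[int]) -> int:
    """Count the interruptions in a set of positions."""
    s = sorted(positions)
    return sum(1 for a, b in zip(s, s[1:]) if b - a > 1)


def gap_degree(heads: List[int]) -> int:
    """Largest number of gaps in the yield of any node (non-empty input)."""
    return max(gaps(p) for p in yields(heads).values())


projective = [2, 0, 2]         # token 2 is the root; tokens 1 and 3 attach to it
non_projective = [3, 0, 2, 2]  # token 1 attaches to token 3, skipping over token 2
print(gap_degree(projective), gap_degree(non_projective))
```

A gap degree of 0 means every node covers a contiguous span, which corresponds to a projective dependency tree. The same counting could be applied to the terminal yields of constituents in a treebank with crossing branches, but that transfer is likewise an assumption and is not taken from the abstract.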
Related resources
Language Independent Dependency to Constituent Tree Conversion
We present a dependency to constituent tree conversion technique that aims to improve constituent parsing accuracies by leveraging dependency treebanks available in a wide variety in many languages. The technique works in two steps. First, a partial constituent tree is derived from a dependency tree with a very simple deterministic algorithm that is both language and dependency type independent...
Synchronous Rewriting in Treebanks
Several formalisms have been proposed for modeling trees with discontinuous phrases. Some of these formalisms allow for synchronous rewriting. However, it is unclear whether synchronous rewriting is a necessary feature. This is an important question, since synchronous rewriting greatly increases parsing complexity. We present a characterization of recursive synchronous rewriting in constituent ...
Large aligned treebanks for syntax-based machine translation
We present a collection of parallel treebanks that have been automatically aligned on both the terminal and the nonterminal constituent level for use in syntax-based machine translation. We describe how they were constructed and applied to a syntax- and example-based machine translation system called Parse and Corpus-Based Machine Translation (PaCo-MT). For the language pair Dutch to English, we ...
Korean Treebank Transformation for Parser Training
Korean is a morphologically rich language in which grammatical functions are marked by inflections and affixes, and they can indicate grammatical relations such as subject, object, predicate, etc. A Korean sentence can be thought of as a sequence of eojeols. An eojeol is a word or its variant word form agglutinated with grammatical affixes, and eojeols are separated by white space as in English ...
Alignment Tools for Parallel Treebanks
This paper reports on our efforts in creating a tri-lingual parallel treebank. The focal points are consistency checking and all aspects of sub-sentential alignment. We discuss the alignment guidelines, the importance of quality checks, and special alignment problems. Then we look at alignment algorithms and alignment visualization tools and we compare our own TreeAligner with other alignmen...
Publication date: 2009